Abstract. Consider a stationary real-valued time series $\{X_n\}_{n=0}^{\infty}$ with a priori unknown
distribution. The goal is to estimate the conditional expectation $E(X_{n+1}|X_0,\dots,X_n)$
based on the observations $(X_0,\dots,X_n)$ in a pointwise consistent way. It is well known that
this is not possible at all values of n. We will estimate it along stopping times.

Introduction and Statement of Results

Suppose the distribution of the real-valued stationary time series $\{X_n\}_{n=0}^{\infty}$ is not
known a priori. The goal is to estimate the conditional expectation $E(X_{n+1}|X_0,\dots,X_n)$
from the data segment $X_0,\dots,X_n$ such that the difference between the estimate and the
conditional expectation tends to zero almost surely as the number of observations
n tends to infinity. This problem (for binary time series) was introduced in Cover (1975).
When one is obliged to estimate for all n, Bailey (1976) and Ryabko (1988) proved the
nonexistence of such a universal algorithm even over the class of all stationary and ergodic
binary time series.

In a special case, for certain Gaussian processes, Schafer (2002) constructed an algo- 
rithm which can estimate the conditional expectation for every time instance n. 

For further reading on related topics cf. Ornstein (1978), Algoet (1992), (1999),
Morvai, Yakowitz and Algoet (1997), Morvai, Yakowitz and Gyorfi (1996), Gyorfi, Lugosi
and Morvai (1999), Gyorfi and Lugosi (2002), Weiss (2000) and Gyorfi et al. (2002).

In this paper we do not require an estimate for every time instance n, but merely
along a sequence of stopping times. That is, looking at the data segment $X_0,\dots,X_n$ our
rule will decide whether we estimate for this n or not, but in any case we will estimate
for infinitely many n. Algorithms of this kind were proposed for binary time series in
Morvai (2003) and Morvai and Weiss (2003).

We will consider two-sided real-valued processes $\{X_n\}_{n=-\infty}^{\infty}$. A one-sided stationary
time series $\{X_n\}_{n=0}^{\infty}$ can always be considered to be a two-sided stationary time series
$\{X_n\}_{n=-\infty}^{\infty}$.

Let $\Re$ be the set of all real numbers and put $\Re^{*-}$ the set of all one-sided sequences of
real numbers, that is,

$$\Re^{*-} = \{(\dots,x_{-1},x_0) : x_i \in \Re \text{ for all } -\infty < i \le 0\}.$$

Define the metric $d^*(\cdot,\cdot)$ on $\Re^{*-}$ as

$$d^*((\dots,x_{-1},x_0),(\dots,y_{-1},y_0)) = \sum_{i=0}^{\infty} 2^{-i-1}\,\frac{|x_{-i}-y_{-i}|}{1+|x_{-i}-y_{-i}|}.$$
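The following is a minimal Python sketch of how such a metric could be evaluated on finite truncations of two pasts. It assumes the metric has the form $\sum_{i\ge 0} 2^{-i-1}|x_{-i}-y_{-i}|/(1+|x_{-i}-y_{-i}|)$; the function name `d_star` and the list convention (index i holds the coordinate at time -i) are illustrative choices, not part of the paper.

```python
def d_star(x_past, y_past):
    """Truncated version of the metric d* on one-sided sequences.

    x_past[i] and y_past[i] hold the coordinates at time -i (assumed
    convention). Only the common prefix is used, so the value is a lower
    bound on the full infinite sum; the neglected tail is at most
    2**-len(prefix).
    """
    total = 0.0
    for i, (a, b) in enumerate(zip(x_past, y_past)):
        gap = abs(a - b)
        total += 2.0 ** -(i + 1) * gap / (1.0 + gap)
    return total
```

Because each summand is bounded by $2^{-i-1}$, two pasts that agree on their most recent coordinates are close in this metric regardless of how they differ in the remote past.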



Definition. The conditional expectation $E(X_1|\dots,X_{-1},X_0)$ is almost surely continuous
if for some set $B \subseteq \Re^{*-}$ which has probability one the conditional expectation
$E(X_1|\dots,X_{-1},X_0)$ restricted to this set B is continuous with respect to the metric $d^*(\cdot,\cdot)$.

Now we introduce our algorithm. For notational convenience, let $X_m^n = (X_m,\dots,X_n)$,
where $m \le n$. Define the nested sequence of partitions $\{\mathcal{P}_k\}_{k=0}^{\infty}$ of the real line as follows.
Let

$$\mathcal{P}_k = \{[i2^{-k},(i+1)2^{-k}) : \text{for } i = 0,1,-1,2,-2,\dots\}.$$

Let $x \mapsto [x]^k$ denote a quantizer that assigns to any point $x \in \Re$ the unique interval in
$\mathcal{P}_k$ that contains x. Let $[X_m^n]^k = ([X_m]^k,\dots,[X_n]^k)$.
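As an illustration of the quantizer, here is a small Python sketch that represents the cell $[i2^{-k},(i+1)2^{-k})$ by its integer index i. The names `quantize`, `cell` and `quantize_block` are hypothetical helpers introduced only for this example.

```python
import math


def quantize(x, k):
    """Index i of the cell [i*2**-k, (i+1)*2**-k) of the k-th partition containing x."""
    return math.floor(x * 2 ** k)


def cell(i, k):
    """Left and right end-points of the cell with index i in the k-th partition."""
    return (i * 2.0 ** -k, (i + 1) * 2.0 ** -k)


def quantize_block(xs, k):
    """Quantize a data segment coordinate by coordinate."""
    return tuple(quantize(x, k) for x in xs)
```

The partitions are nested: the level-k cell of a point is obtained from its level-(k+1) cell by integer halving of the index, which is what makes the quantized blocks at finer levels determine those at coarser levels.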

We define the stopping times $\{\lambda_n\}$ along which we will estimate. Set $\lambda_0 = 0$. For
$n = 1,2,\dots$, define $\lambda_n$ recursively. Let

$$\lambda_n = \lambda_{n-1} + \min\{t > 0 : [X_t^{\lambda_{n-1}+t}]^n = [X_0^{\lambda_{n-1}}]^n\}. \qquad (1)$$

Note that $\lambda_n \ge n$ and it is a stopping time on $[X_0^{\infty}]^n$. Let $f_k : \mathcal{P}_k \to \Re$ denote a function
that assigns to any cell $A \in \mathcal{P}_k$ a point in A. The nth estimate $m_n$ is defined as



$$m_n = \frac{1}{n}\sum_{j=0}^{n-1} f_j([X_{\lambda_j+1}]^j). \qquad (2)$$

Observe that $m_n$ depends solely on $[X_0^{\lambda_n}]^n$. This estimator can be viewed as a sampled
version of the predictor in Morvai, Yakowitz and Gyorfi (1996), Weiss (2000),
Algoet (1999) and Gyorfi et al. (2002).
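The recursion for the stopping times and the averaging in the estimate can be sketched in Python on a finite sample as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function name is hypothetical, $f_j$ is taken to be the left end-point of the cell (any point of the cell is allowed), and the search simply stops when the sample is too short to contain the next recurrence.

```python
import math


def stopping_times_and_estimates(xs, n_max):
    """Return ([lambda_0, ..., lambda_N], [m_1, ..., m_N]) for the largest N <= n_max
    that the finite sample xs allows."""

    def q(x, k):
        # index of the cell [i*2**-k, (i+1)*2**-k) containing x
        return math.floor(x * 2 ** k)

    lambdas = [0]
    estimates = []
    running_sum = 0.0
    for n in range(1, n_max + 1):
        prev = lambdas[-1]
        # quantized initial block [X_0^{lambda_{n-1}}]^n
        target = [q(x, n) for x in xs[: prev + 1]]
        shift = None
        t = 1
        while prev + t < len(xs):
            # compare [X_t^{lambda_{n-1}+t}]^n with the initial block
            if [q(x, n) for x in xs[t : prev + t + 1]] == target:
                shift = t
                break
            t += 1
        if shift is None:
            break  # sample too short: lambda_n cannot be located
        lambdas.append(prev + shift)
        j = n - 1
        # f_j([X_{lambda_j + 1}]^j), with f_j chosen as the left end-point (assumption)
        running_sum += q(xs[prev + 1], j) * 2.0 ** -j
        estimates.append(running_sum / n)
    return lambdas, estimates
```

For the periodic toy sample `[0.25, 0.75] * 20` the stopping times come out as 0, 2, 4, 6, ... and each new summand is the quantized value of the observation that followed the previous stopping time.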

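Define the time series $\{\tilde X_n\}_{n=-\infty}^{0}$ as

$$\tilde X_{-n} = \lim_{j\to\infty} X_{\lambda_j-n} \quad \text{for } n \ge 0, \qquad (3)$$

where the limit exists since the intervals $\{[X_{\lambda_j-n}]^j\}_{j=n}^{\infty}$ are nested and their lengths tend
to zero.

Define the function $e : \Re^{*-} \to (-\infty,\infty)$ as

$$e(x_{-\infty}^0) = E(X_1|X_{-\infty}^0 = x_{-\infty}^0).$$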
We will prove the following theorem.

Theorem. Let $\{X_n\}$ be a real-valued stationary time series with $E(|X_0|^2) < \infty$. Then
almost surely

$$\lim_{n\to\infty} m_n = \lim_{n\to\infty} E(X_{\lambda_n+1}|[X_0^{\lambda_n}]^n) = e(\tilde X_{-\infty}^0)$$

and

$$\lim_{n\to\infty} \left| m_n - E(X_{\lambda_n+1}|[X_0^{\lambda_n}]^n) \right| = 0.$$

Moreover, if in addition the conditional expectation $E(X_1|X_{-\infty}^0)$ is almost surely continuous
then almost surely

$$\lim_{n\to\infty} \left| m_n - E(X_{\lambda_n+1}|X_0^{\lambda_n}) \right| = 0.$$

Unfortunately, there is a stationary and ergodic Markov chain $\{X_n\}$ taking values from
a countable subset of the unit interval such that

$$P\left( \limsup_{n\to\infty} \left| m_n - E(X_{\lambda_n+1}|X_0^{\lambda_n}) \right| > 0 \right) > 0.$$



Remarks. 

Let $\{X_n\}$ be a real-valued stationary time series with $E(|X_0|^2) < \infty$. If the distribution
of $X_0$ happens to concentrate on finitely many atoms then

$$E(X_{\lambda_n+1}|[X_0^{\lambda_n}]^n) = E(X_{\lambda_n+1}|X_0^{\lambda_n}) \quad \text{eventually}$$

and so $|m_n - E(X_{\lambda_n+1}|X_0^{\lambda_n})| \to 0$ almost surely, without any continuity condition.

Let $\{X_n\}$ be a real-valued stationary time series with $E(|X_0|^2) < \infty$. If one knows
in advance that the distribution of $X_0$ concentrates on finitely or countably infinitely many atoms
then one may omit the partition $\mathcal{P}_k$, the quantizer $[\cdot]^k$ and the function $f_k(\cdot)$ entirely.
That is, one may define $\lambda'_0 = 0$ and for $n = 1,2,\dots$ set

$$\lambda'_n = \lambda'_{n-1} + \min\{t > 0 : X_t^{\lambda'_{n-1}+t} = X_0^{\lambda'_{n-1}}\}$$

and

$$m'_n = \frac{1}{n}\sum_{j=0}^{n-1} X_{\lambda'_j+1}.$$

Then

$$\lim_{n\to\infty} \left| m'_n - E(X_{\lambda'_n+1}|X_0^{\lambda'_n}) \right| = 0 \quad \text{almost surely}$$

without any continuity condition. In particular, $m'_n$ works for the counterexample process
in the third part of the Theorem.
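For the discrete-valued case just described, the simplified rule can be sketched in Python as below. The function name and the finite-sample truncation are assumptions made for the example; exact equality of blocks replaces the comparison of quantized blocks.

```python
def discrete_stopping_times_and_estimates(xs, n_max):
    """Simplified scheme for atomic distributions: no partition, quantizer or f_k.

    Returns ([lambda'_0, ..., lambda'_N], [m'_1, ..., m'_N]) for the largest
    N <= n_max that the finite sample xs allows.
    """
    xs = list(xs)
    lambdas = [0]
    estimates = []
    total = 0.0
    for n in range(1, n_max + 1):
        prev = lambdas[-1]
        target = xs[: prev + 1]  # the initial block X_0^{lambda'_{n-1}}
        shift = next(
            (t for t in range(1, len(xs) - prev) if xs[t : prev + t + 1] == target),
            None,
        )
        if shift is None:
            break  # sample too short to see the next exact recurrence
        lambdas.append(prev + shift)
        total += xs[prev + 1]  # the observation that followed the previous stopping time
        estimates.append(total / n)
    return lambdas, estimates
```

On the alternating sample `[0, 1] * 10` every stopping time ends an occurrence of a block finishing in 0, so each averaged successor equals 1.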

The counterexample Markov chain in the third part of the Theorem of course will not
possess an almost surely continuous conditional expectation $E(X_1|X_{-\infty}^0)$.

From the proofs of Bailey (1976), Ryabko (1988), Gyorfi, Morvai, Yakowitz (1998) it is
clear that even for the class of all stationary and ergodic binary time series with almost
surely continuous conditional expectation $E(X_1|X_{-\infty}^0)$ one cannot estimate $E(X_{n+1}|X_0^n)$
for all n in a pointwise consistent way.

Proofs

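It will be useful to define other processes $\{\hat X^{(k)}_n\}_{n=-\infty}^{\infty}$ for $k \ge 0$ as follows. Let

$$\hat X^{(k)}_n = X_{\lambda_k+n} \quad \text{for } -\infty < n < \infty. \qquad (4)$$

For an arbitrary real-valued stationary time series $\{Y_n\}$, let $\hat\lambda_0(Y_{-\infty}^0) = 0$ and for $n \ge 1$
define

$$\hat\lambda_n(Y_{-\infty}^0) = \hat\lambda_{n-1}(Y_{-\infty}^0) - \min\{t > 0 : [Y^{-t}_{\hat\lambda_{n-1}-t}]^n = [Y^{0}_{\hat\lambda_{n-1}}]^n\}.$$

Let T denote the left shift operator, that is, $(Tx_{-\infty}^{\infty})_i = x_{i+1}$. It is easy to see that if
$\lambda_n(x_{-\infty}^{\infty}) = l$ then $\hat\lambda_n(T^l x_{-\infty}^{\infty}) = -l$.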

Proof of the Theorem. 

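Step 1. We show that for arbitrary $k \ge 0$, the time series $\{\hat X^{(k)}_n\}_{n=-\infty}^{\infty}$ and $\{X_n\}_{n=-\infty}^{\infty}$
have identical distribution.

It is enough to show that for all $k \ge 0$, $m \ge n \ge 0$, and Borel set $F \subseteq \Re^{n+1}$,

$$P((\hat X^{(k)}_{m-n},\dots,\hat X^{(k)}_m) \in F) = P(X_{m-n}^m \in F).$$

This is immediate by stationarity of $\{X_n\}$ and by the fact that for all $k \ge 0$, $m \ge n \ge 0$,
$l \ge 0$, $F \subseteq \Re^{n+1}$,

$$T^l\{X^{\lambda_k+m}_{\lambda_k+m-n} \in F,\ \lambda_k = l\} = \{X_{m-n}^m \in F,\ \hat\lambda_k(X_{-\infty}^0) = -l\}.$$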

Step 2. We show that for k > 0, almost surely, 

and 

Since we are dealing with a nested sequence of partitions and $\hat\lambda_k$ depends solely on the
kth quantized sequence, it is enough to prove that for any $i \ge 0$ and for all $j \ge i$, almost
surely $[\tilde X_{-i}]^{j+1} = [\hat X^{(j)}_{-i}]^{j+1}$. (Note that $\lambda_j(X_0^{\infty}) - j \ge 0$.) If $[\tilde X_{-i}]^{j+1} \ne [\hat X^{(j)}_{-i}]^{j+1}$ for some
$j \ge i$ then this must happen at a right end-point of some interval in $\bigcup_{k=0}^{\infty} \mathcal{P}_k$. By (3)
and Step 1, we have

$$1 - P(\tilde X_{-i} \in [\hat X^{(j)}_{-i}]^{j+1} \text{ for all } j \ge i)$$
$$\le \sum_{k=i}^{\infty}\sum_{s=-\infty}^{\infty} P(\tilde X_{-i} = s2^{-k},\ \hat X^{(j)}_{-i} < \tilde X_{-i} \text{ for all } j \ge k)$$
$$\le \sum_{k=i}^{\infty}\sum_{s=-\infty}^{\infty} \lim_{j\to\infty} P(s2^{-k} - 2^{-j} \le \hat X^{(j)}_{-i} < s2^{-k})$$
$$= \sum_{k=i}^{\infty}\sum_{s=-\infty}^{\infty} \lim_{j\to\infty} P(s2^{-k} - 2^{-j} \le X_{-i} < s2^{-k})$$
$$= 0.$$

Step 3. We show that the distributions of $\{\tilde X_n\}_{n=-\infty}^{0}$ and $\{X_n\}_{n=-\infty}^{0}$ are the same.
This is immediate from Step 1 and Step 2.

The time series $\{\tilde X_n\}_{n=-\infty}^{0}$ is stationary, since $\{X_n\}_{n=-\infty}^{0}$ is stationary, and it can be
extended to be a two-sided time series $\{\tilde X_n\}_{n=-\infty}^{\infty}$. We will use this fact only for the
purpose of defining the conditional expectation $E(\tilde X_1|\tilde X_{-\infty}^0)$.

Step 4. We prove the first part of the Theorem. 

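Consider

$$m_n = \frac{1}{n}\sum_{j=0}^{n-1}\left( f_j([X_{\lambda_j+1}]^j) - E(f_j([X_{\lambda_j+1}]^j)|[X_0^{\lambda_j}]^j) \right)$$
$$\quad + \frac{1}{n}\sum_{j=0}^{n-1}\left( E(f_j([X_{\lambda_j+1}]^j)|[X_0^{\lambda_j}]^j) - E(X_{\lambda_j+1}|[X_0^{\lambda_j}]^j) \right)$$
$$\quad + \frac{1}{n}\sum_{j=0}^{n-1} E(X_{\lambda_j+1}|[X_0^{\lambda_j}]^j). \qquad (5)$$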

Observe that $\{\Gamma_j = f_j([X_{\lambda_j+1}]^j) - E(f_j([X_{\lambda_j+1}]^j)|[X_0^{\lambda_j}]^j)\}$ is a sequence of orthogonal
random variables with $E\Gamma_j = 0$ and $E(\Gamma_j^2) \le E(|X_1|^2) + 2E|X_1| + 1$ since $E(\Gamma_j^2) \le
E(|X_{\lambda_j+1}|^2) + 2E|X_{\lambda_j+1}| + 1$ and, by Step 1, $X_{\lambda_j+1}$ has the same distribution as $X_1$.
Now by Theorem 3.2.2 in Révész (1968),

$$\frac{1}{n}\sum_{j=0}^{n-1}\Gamma_j \to 0 \quad \text{almost surely.}$$

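The second term tends to zero since $|f_j([X_{\lambda_j+1}]^j) - X_{\lambda_j+1}| < 2^{-j}$. Now we deal with
the third term. By Step 2, Step 1 and Step 3,

$$E(X_{\lambda_j+1}|[X_0^{\lambda_j}]^j) = E(\tilde X_1|[\tilde X^0_{\hat\lambda_j(\tilde X_{-\infty}^0)}]^j).$$

The latter forms a martingale and by Theorem 7.6.2 in Ash (1972), almost surely,

$$E(X_{\lambda_j+1}|[X_0^{\lambda_j}]^j) = E(\tilde X_1|[\tilde X^0_{\hat\lambda_j(\tilde X_{-\infty}^0)}]^j) \to E(\tilde X_1|\tilde X_{-\infty}^0). \qquad (6)$$

By (5) and (6), almost surely,

$$\lim_{n\to\infty} m_n = E(\tilde X_1|\tilde X_{-\infty}^0). \qquad (7)$$

Thus the first part of the Theorem is proved.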
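Step 5. We prove the second part of the Theorem.

By (7) it is enough to prove that almost surely $E(X_{\lambda_j+1}|X_0^{\lambda_j}) \to E(\tilde X_1|\tilde X_{-\infty}^0)$ provided
that $E(X_1|X_{-\infty}^0)$ is almost surely continuous. By assumption, the function $e(\cdot)$ is continuous
on a set $B \subseteq \Re^{*-}$ with $P(X_{-\infty}^0 \in B) = 1$. By Step 1 and Step 3,

$$P(\tilde X_{-\infty}^0 \in B,\ (\dots,\hat X^{(j)}_{-1},\hat X^{(j)}_0) \in B \text{ for all } j \ge 0) = 1. \qquad (8)$$

Let

$$N_j(X_0^{\lambda_j}) = \{z_{-\infty}^0 \in \Re^{*-} : z_{-\lambda_j} \in [X_0]^j,\dots,z_0 \in [X_{\lambda_j}]^j\}.$$

By (4), (8) and Step 2, almost surely, for all j,

$$(\dots,\hat X^{(j)}_{-1},\hat X^{(j)}_0) \in N_j(X_0^{\lambda_j}) \cap B \quad \text{and} \quad \tilde X_{-\infty}^0 \in N_j(X_0^{\lambda_j}) \cap B. \qquad (9)$$

Put

$$\Theta_j(X_0^{\lambda_j}) = \sup_{y_{-\infty}^0,\,z_{-\infty}^0 \in N_j(X_0^{\lambda_j}) \cap B} |e(y_{-\infty}^0) - e(z_{-\infty}^0)|.$$

Since $e(\cdot)$ is continuous on the set B and by (9), almost surely,

$$\lim_{j\to\infty} \Theta_j(X_0^{\lambda_j}) = 0. \qquad (10)$$

By (9) and (10), almost surely,

$$\limsup_{j\to\infty} \left| E(e(\tilde X_{-\infty}^0)|[X_0^{\lambda_j}]^j) - E(e(\dots,\hat X^{(j)}_{-1},\hat X^{(j)}_0)|X_0^{\lambda_j}) \right|$$
$$\le \limsup_{j\to\infty} E\left( \left| E(e(\tilde X_{-\infty}^0)|[X_0^{\lambda_j}]^j) - e(\dots,\hat X^{(j)}_{-1},\hat X^{(j)}_0) \right| \,\Big|\, X_0^{\lambda_j} \right)$$
$$\le \limsup_{j\to\infty} E(\Theta_j(X_0^{\lambda_j})|X_0^{\lambda_j})$$
$$= \limsup_{j\to\infty} \Theta_j(X_0^{\lambda_j})$$
$$= 0. \qquad (11)$$

By Step 2,

$$E(X_{\lambda_j+1}|X_0^{\lambda_j}) = E(e(\tilde X_{-\infty}^0)|[\tilde X^0_{\hat\lambda_j(\tilde X_{-\infty}^0)}]^j)
- \left\{ E(e(\tilde X_{-\infty}^0)|[X_0^{\lambda_j}]^j) - E(e(\dots,\hat X^{(j)}_{-1},\hat X^{(j)}_0)|X_0^{\lambda_j}) \right\}.$$

The first term tends to $e(\tilde X_{-\infty}^0)$ by the almost sure martingale convergence theorem
(cf. Theorem 7.6.2 in Ash (1972)) since by Step 3, $E|e(\tilde X_{-\infty}^0)| \le E|X_1| < \infty$.
The second term tends to zero by (11). The proof of the second part of the Theorem is
complete.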

Step 6. We prove the third part of the Theorem. 

First we define a Markov chain {M n } on the nonnegative integers which will serve as 
a technical tool for our counterexample process. Let the transition probabilities be as 
follows. 



$$P(M_1 = 0|M_0 = 0) = P(M_1 = 1|M_0 = 0) = P(M_1 = 0|M_0 = 1) = 2^{-1}$$

and for $i = 2,3,\dots$, let

$$P(M_1 = i|M_0 = 1) = 2^{-i} \quad \text{and} \quad P(M_1 = 0|M_0 = i) = 1.$$

All other transitions happen with probability zero. Note that one can reach state 1 only
from state 0. It is easy to see that the Markov chain just defined yields a stationary
and ergodic time series with initial probabilities $P(M_0 = 0) = \frac{4}{7}$, $P(M_0 = 1) = \frac{2}{7}$, and
for $i = 2,3,\dots$, $P(M_0 = i) = \frac{1}{7\cdot 2^{i-1}}$. Our counterexample process $\{X_n\}$ will be a one
to one function of the Markov chain $\{M_n\}$. Define the function $h : \{0,1,2,\dots\} \to \Re$ as

$h(0) = 0$, $h(1) = 1$ and for $i \ge 2$ put $h(i) = \dots$ . Let $X_n = h(M_n)$. Since $h(\cdot)$ is one to

one, $\{X_n\}$ is also a Markov chain. Since $\{\tilde X_n\}$ has the same distribution as $\{X_n\}$, $\{\tilde X_n\}$
is also a Markov chain. Let

$$A_n = \{h(i) : h(i) < 2^{-(n+1)} \text{ for } i = 0,1,2,\dots\}.$$

Note that $h(i) \in A_n$ if and only if $[h(i)]^{n+1} = [0]^{n+1}$. Define the event

$$H = \{\tilde X_0 = 0,\ X_0^1 = (0,1)\}.$$
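As a sanity check on the chain $\{M_n\}$ defined above, the stationary probabilities implied by the stated transition probabilities can be verified with exact rational arithmetic. The sketch below assumes the stationary law $\pi_0 = 4/7$, $\pi_1 = 2/7$, $\pi_i = 1/(7\cdot 2^{i-1})$ for $i \ge 2$, obtained by solving the balance equations; the function names are hypothetical.

```python
from fractions import Fraction


def transition(a, b):
    """P(M_1 = b | M_0 = a) for the auxiliary chain on the nonnegative integers."""
    half = Fraction(1, 2)
    if a == 0:
        return half if b in (0, 1) else Fraction(0)
    if a == 1:
        if b == 0:
            return half
        return Fraction(1, 2 ** b) if b >= 2 else Fraction(0)
    # every state i >= 2 returns to 0 with probability one
    return Fraction(1) if b == 0 else Fraction(0)


def stationary_mass(i):
    """Candidate stationary probability of state i (assumed closed form)."""
    if i == 0:
        return Fraction(4, 7)
    if i == 1:
        return Fraction(2, 7)
    return Fraction(1, 7 * 2 ** (i - 1))


def inflow(b, n_states):
    """Sum over a < n_states of pi(a) * P(a, b); exact for b >= 1, truncated for b = 0."""
    return sum(stationary_mass(a) * transition(a, b) for a in range(n_states))
```

For every state $b \ge 1$ the balance equation holds exactly, while for $b = 0$ the truncated inflow falls short of $4/7$ by exactly the stationary mass of the omitted states. The same arithmetic gives $\pi_0 \cdot \frac12 \cdot \frac12 = \frac17$ for the probability of observing the block (0, 1, 0).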

Observe: If $X_1 = 1$ then $X_0 = 0$. (State 1 can be reached only from state 0.) The event
$\{\tilde X_0 = 0\}$ happens if and only if $X_{\lambda_n} \in A_n$ for all $n = 1,2,\dots$. Since $[h(0)]^1 = [h(i)]^1$
for $i \ge 2$ and for all $k \ge 0$, $[h(1)]^k \ne [h(i)]^k$ provided $i \ne 1$, the event $\{\tilde X_{-1} = 1\}$ occurs
if and only if $X_1 = 1$. It follows that

$$H = \{X_0 = 0,\ X_1 = 1,\ X_{\lambda_n} \in A_n \text{ for } n = 1,2,\dots\} = \{\tilde X_{-2}^0 = (0,1,0)\}.$$

Since the time series $\{\tilde X_n\}$ has the same distribution as $\{X_n\}$,

$$P(H) = P(X_{-2}^0 = (0,1,0)) = \frac{4}{7}\cdot\frac{1}{2}\cdot\frac{1}{2} = \frac{1}{7} > 0.$$

It will be enough to show that $X_{\lambda_n} \in A_n - \{0\}$ happens infinitely often given the condition
H, since if $X_{\lambda_n} \in A_n - \{0\}$ happens then $X_{\lambda_n+1} = 0$ and by (7), on H,

$$m_n \to E(\tilde X_1|\tilde X_0 = 0) = 0.5$$

and so

$$P\left( \limsup_{n\to\infty} |m_n - E(X_{\lambda_n+1}|X_0^{\lambda_n})| = 0.5 \,\Big|\, H \right) = 1$$

and $P(H) > 0$. To prove that $\{X_{\lambda_n} \in A_n - \{0\}\}$ occurs infinitely often we need the
following observation for repeated use: By the Markov property and the construction
in (1), if $x_i \in A_i$ for $i = 1,2,\dots,j$ then for $j \ge 1$,

$$P(X_{\lambda_j} = x_j|X_0^1 = (0,1),\ X_{\lambda_m} = x_m \text{ for } 1 \le m < j) = P(X_1 = x_j|X_0 = 1,\ X_1 \in A_{j-1}). \qquad (12)$$

Indeed, for j = 1 this is trivial, since $X_1 = 1$ implies that $X_0 = 0$, $\lambda_1 = 2$ while $X_0 = 1$
implies that $X_1 \in A_0$. For $j \ge 2$ set $\psi_1^j = \lambda_{j-1} - 1$ and for $i > 1$ the $\psi_i^j$ will be the
successive occurrences of the block $[X_0^{\lambda_{j-1}-1}]^j$ in the j-th quantization, defined by

These $\psi_i^j$ are stopping times for $i = 1,2,\dots$. Temporarily let $D_j$ denote the event

$$\{X_0^1 = (0,1),\ X_{\lambda_m} = x_m \text{ for } 1 \le m < j\}.$$

The way that $\lambda_j$ is defined means that on $D_j$ if $\lambda_j$ occurs at the i-th repetition of
$[X_0^{\lambda_{j-1}-1}]^j$ it is because $\psi_i^j < \lambda_j$ and $X_{\psi_i^j+1} \in A_{j-1}$. It follows that

$$P(X_{\lambda_j} = x_j|D_j) = \sum_{i=1}^{\infty} P(X_{\psi_i^j+1} = x_j|X_{\psi_i^j+1} \in A_{j-1},\ \psi_i^j < \lambda_j,\ D_j)\,P(\psi_i^j + 1 = \lambda_j|D_j).$$

Since $x_j \in A_j \subseteq A_{j-1}$ each expression $P(X_{\psi_i^j+1} = x_j|X_{\psi_i^j+1} \in A_{j-1},\ \psi_i^j < \lambda_j,\ D_j)$ can
be written as

$$\frac{P(X_{\psi_i^j+1} = x_j|\psi_i^j < \lambda_j,\ D_j)}{P(X_{\psi_i^j+1} \in A_{j-1}|\psi_i^j < \lambda_j,\ D_j)}$$
and then by decomposition according to the value $l$ of $\psi_i^j$ we get

$$P(X_{\psi_i^j+1} = x_j|X_{\psi_i^j+1} \in A_{j-1},\ \psi_i^j < \lambda_j,\ D_j)$$
$$= \sum_{l} \frac{P(X_{l+1} = x_j|\psi_i^j = l < \lambda_j,\ D_j)}{P(X_{l+1} \in A_{j-1}|\psi_i^j = l < \lambda_j,\ D_j)}\,P(\psi_i^j = l|X_{\psi_i^j+1} \in A_{j-1},\ \psi_i^j < \lambda_j,\ D_j).$$

Observe that $X_{\psi_i^j} = 1$ provided $X_1 = 1$ and the event $\{\psi_i^j < \lambda_j\}$ is measurable with
respect to $\sigma([X_0^{\psi_i^j}]^j)$. Now by the Markov property we get

$$= \sum_{l} \frac{P(X_{l+1} = x_j|X_l = 1)}{P(X_{l+1} \in A_{j-1}|X_l = 1)}\,P(\psi_i^j = l|X_{\psi_i^j+1} \in A_{j-1},\ \psi_i^j < \lambda_j,\ D_j).$$

By stationarity and since $x_j \in A_j \subseteq A_{j-1}$,

$$= \frac{P(X_1 = x_j|X_0 = 1)}{P(X_1 \in A_{j-1}|X_0 = 1)} = P(X_1 = x_j|X_1 \in A_{j-1},\ X_0 = 1).$$

Combining all this we get

$$P(X_{\lambda_j} = x_j|D_j) = P(X_1 = x_j|X_1 \in A_{j-1},\ X_0 = 1)\sum_{i=1}^{\infty} P(\psi_i^j + 1 = \lambda_j|D_j)$$
$$= P(X_1 = x_j|X_1 \in A_{j-1},\ X_0 = 1)$$

and we have proved (12).

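In order to show that the events

$$\{X_{\lambda_n} \in A_n - \{0\}\}$$

occur infinitely often we prove that they have sufficiently large conditional probabilities
and they are conditionally independent given the condition H. First we calculate
$P(X_{\lambda_n} \in A_n - \{0\}|H)$. For $n \ge 2$, by (12),

$$P(X_{\lambda_n} \in A_n - \{0\}|H) = \frac{P(\{X_{\lambda_n} \in A_n - \{0\}\} \cap H)}{P(H)}$$
$$= \frac{P(X_{\lambda_n} \in A_n - \{0\}|X_0^1 = (0,1),\ X_{\lambda_j} \in A_j \text{ for } 1 \le j < n)}{P(X_{\lambda_n} \in A_n|X_0^1 = (0,1),\ X_{\lambda_j} \in A_j \text{ for } 1 \le j < n)}$$
$$\quad \times \prod_{m=n+1}^{\infty} \frac{P(X_{\lambda_m} \in A_m|X_0^1 = (0,1),\ X_{\lambda_n} \in A_n - \{0\},\ X_{\lambda_j} \in A_j \text{ for } 1 \le j < m)}{P(X_{\lambda_m} \in A_m|X_0^1 = (0,1),\ X_{\lambda_j} \in A_j \text{ for } 1 \le j < m)}$$
$$= \frac{P(X_{\lambda_n} \in A_n - \{0\}|X_0^1 = (0,1),\ X_{\lambda_j} \in A_j \text{ for } 1 \le j < n)}{P(X_{\lambda_n} \in A_n|X_0^1 = (0,1),\ X_{\lambda_j} \in A_j \text{ for } 1 \le j < n)}$$
$$\ge P(X_{\lambda_n} \in A_n - \{0\}|X_0^1 = (0,1),\ X_{\lambda_j} \in A_j \text{ for } 1 \le j < n)$$
$$= P(X_1 \in A_n,\ X_1 \ne 0|X_0 = 1,\ X_1 \in A_{n-1})$$
$$\ge P(X_1 \in A_n,\ X_1 \ne 0|X_0 = 1)$$
$$= \sum_{i:\,h(i) \in A_n - \{0\}} \frac{1}{2^i} = \sum_{i > \log_2(n)} \frac{1}{2^i} \ge \frac{1}{n}.$$

We have just proved that

$$\sum_{n} P(X_{\lambda_n} \in A_n - \{0\}|H) \ge \sum_{n} \frac{1}{n} = \infty. \qquad (13)$$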

Now we will prove that for $n = 1,2,\dots$, the events $\{X_{\lambda_n} \in A_n - \{0\}\}$ are conditionally
independent given H. Since

$$P(X_{\lambda_i} \in A_i - \{0\} \text{ for } i = 1,2,\dots,k|H)
= \sum_{x_1 \in A_1 - \{0\}} \cdots \sum_{x_k \in A_k - \{0\}} P(X_{\lambda_i} = x_i \text{ for } i = 1,2,\dots,k|H)$$

it is enough to show that the events $\{X_{\lambda_i} = x_i\}$ are conditionally independent given the
condition H, provided that $x_i \in A_i$. Let $x_i \in A_i$. Then by repeated use of (12)

$$P(X_{\lambda_i} = x_i \text{ for } i = 1,2,\dots,k|H) = \frac{P(X_{\lambda_i} = x_i \text{ for } i = 1,2,\dots,k,\ H)}{P(H)}$$
$$= \left( \prod_{m=1}^{k} \frac{P(X_{\lambda_m} = x_m|X_0^1 = (0,1),\ X_{\lambda_j} = x_j \text{ for } 1 \le j < m)}{P(X_{\lambda_m} \in A_m|X_0^1 = (0,1),\ X_{\lambda_j} \in A_j \text{ for } 1 \le j < m)} \right)$$
$$\quad \times \left( \prod_{l=k+1}^{\infty} \frac{P(X_{\lambda_l} \in A_l|X_0^1 = (0,1),\ X_{\lambda_i} = x_i \text{ for } 1 \le i \le k \text{ and } X_{\lambda_j} \in A_j \text{ for } 1 \le j < l)}{P(X_{\lambda_l} \in A_l|X_0^1 = (0,1),\ X_{\lambda_j} \in A_j \text{ for } 1 \le j < l)} \right)$$
$$= \prod_{m=1}^{k} \left( \frac{P(X_{\lambda_m} = x_m|X_0^1 = (0,1),\ X_{\lambda_j} \in A_j \text{ for } 1 \le j < m)}{P(X_{\lambda_m} \in A_m|X_0^1 = (0,1),\ X_{\lambda_j} \in A_j \text{ for } 1 \le j < m)} \right.$$
$$\quad \left. \times \prod_{l=m+1}^{\infty} \frac{P(X_{\lambda_l} \in A_l|X_0^1 = (0,1),\ X_{\lambda_m} = x_m,\ X_{\lambda_j} \in A_j \text{ for } 1 \le j < l)}{P(X_{\lambda_l} \in A_l|X_0^1 = (0,1),\ X_{\lambda_j} \in A_j \text{ for } 1 \le j < l)} \right)$$
$$= \prod_{i=1}^{k} \frac{P(X_{\lambda_i} = x_i,\ H)}{P(H)}$$
$$= \prod_{i=1}^{k} P(X_{\lambda_i} = x_i|H).$$

Now by (13) and the Borel-Cantelli lemma (cf. Lemma B in Rényi (1970) on page 390)
the events $\{X_{\lambda_n} \in A_n - \{0\}\}$ occur infinitely often and the third part of the Theorem is
proved. The proof of the Theorem is complete.